Symbolic sequences with long-range correlations are expected to result in a slow regression to a steady state of entropy increase. However, we prove that in this case, too, a fast transition to a constant rate of entropy increase can be obtained, provided that the extensive entropy of Tsallis with entropic index $q$ is adopted, thereby resulting in a new form of entropy that we shall refer to as Kolmogorov-Sinai-Tsallis (KST) entropy. We assume that the same symbols, either $1$ or $-1$, are repeated in strings of length $l$, with the probability distribution $p(l) \propto 1/l^{\mu}$. The numerical evaluation of the KST entropy suggests that at the value $\mu = 2$ a sort of abrupt transition might occur. For the values of $\mu$ in the range $1 < \mu < 2$ the entropic index $q$ is expected to vanish, as a consequence of the fact that in this case the average length $\langle l \rangle$ diverges, thereby breaking the balance between determinism and randomness in favor of determinism. In the region $\mu > 2$ the entropic index $q$ seems to depend on $\mu$ through the power law expression $q = (\mu - 2)^{\alpha}$ with $\alpha \approx 0.13$ ($q = 1$ with $\mu > 3$). It is argued that this phase-transition-like property signals the onset of the thermodynamical regime at $\mu = 2$.

It has been recently pointed out [1] that power law spectra are observed in many disciplines of science, ranging from astronomy, geography and physics to electronics, acoustics, linguistics and music. It is also interesting to establish a connection between these observed properties and their algorithmic complexity. This is important not only from a conceptual point of view [2]: it might also result in methods for detecting correlations. In this respect, we want to mention the search for correlations in DNA sequences based on the adoption of entropic indicators [?].

It has been remarked [?], however, that something intermediate between periodic and chaotic dynamical behavior exists and that suitable tools to analyze these processes must be built. These conclusions are widely shared in the literature. For instance, the authors of Refs. [?], [?], [10], as well as those of Ref. [?], show that the entropy of symbolic sequences with long-range correlations exhibits a very slow regression to the condition of constant Kolmogorov entropy. Analogous results are found in many other papers [11-13] as well as in earlier papers [14].

We shall refer to the Kolmogorov entropy applied to symbolic sequences as metric entropy (ME) [15], to keep it distinct from the Kolmogorov-Sinai entropy (KSE) [16,17]. The two entropies are closely related to one another, since both are expressed in terms of the Shannon-Gibbs entropy. However, the latter, the KSE, refers to individual trajectories and, in principle, does not imply any coarse-graining if the assumption is made that cells and time steps of arbitrarily small size can be used. The former applies to symbolic sequences and consequently might be affected by a coarse-graining process so large as to lose a direct connection with the rules, either stochastic or deterministic, from which the sequence is generated. This aspect will be made more transparent by the discussion of the numerical experiment described in this paper.

The main purpose of this paper is to discuss the consequences of expressing the ME in terms of the Tsallis entropy [18] rather than the Shannon entropy. This is a form of ME that we shall refer to as Kolmogorov-Sinai-Tsallis (KST) entropy. The Tsallis entropy reads

$$H_q = \frac{1 - \sum_{i=1}^{W} p_i^{q}}{q - 1}. \tag{1}$$

Note that this entropy is characterized by the index $q$, whose departure from the conventional value $q = 1$ signals the thermodynamic effects of either long-range correlations in fractal dynamics or the non-local character of quantum mechanics [19]. The increasing interest in Tsallis' non-extensive entropy is attested by the exponentially growing list of publications on this topic [?].
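As an illustration only, the Tsallis prescription $H_q = (1 - \sum_i p_i^q)/(q - 1)$ can be sketched in a few lines of Python. The function name `tsallis_entropy` and the tolerance used to switch to the Shannon limit at $q = 1$ are assumptions made for this example, not part of the original analysis.

```python
import math

def tsallis_entropy(probs, q):
    """Tsallis entropy (1 - sum_i p_i^q) / (q - 1), with the Shannon limit at q = 1."""
    # States with zero probability are skipped (they contribute nothing for q > 0).
    probs = [p for p in probs if p > 0.0]
    if abs(q - 1.0) < 1e-12:          # assumed tolerance for the q -> 1 limit
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

uniform = [0.25] * 4
print(tsallis_entropy(uniform, 1.0))     # Shannon value, ln 4
print(tsallis_entropy(uniform, 0.999))   # close to the Shannon value
print(tsallis_entropy(uniform, 2.0))     # 1 - 4 * (1/16) = 0.75
```

The second line of output shows numerically that the Shannon-Gibbs form is recovered continuously as $q \to 1$.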

Of remarkable interest for the subject of fractal dynamics is the recent discovery by Tsallis et al. [21] that the entropic index $q$ also determines the specific analytical form of the trajectory instability. Two trajectories, moving from infinitely close but distinct initial conditions, depart from one another with a law more general than the exponential prescription. The exponential instability is a sort of singularity, namely, a special case of a more general, non-exponential, prescription. This important result is based on the generalization of the KSE [16,17] and consequently of the theorem of Pesin [22]. Palatella and Grigolini [19] have recently corroborated the conclusions of Miller and Sarkar [23], who prove that in the quantum case the von Neumann entropy is linearly proportional to the KSE. Furthermore, the results of these authors have been extended [19] to the case where the quantum expression for the entropy (the von Neumann entropy) is expressed in terms of the Tsallis prescription. The interesting conclusion is that $q < 1$: Palatella and Grigolini [19] argue that this result is a reflection of the occurrence of Anderson localization.

The present paper is devoted to discussing the suitability of the KST entropy for revealing whether or not a symbolic sequence has a thermodynamical nature. The discussion rests on a key experiment, planned for the specific purpose of establishing correlations in sequences of symbols. The sequence of symbols is established as follows. Two computer generators of random numbers, $x$ and $z$, are used. The former generates random numbers distributed with equal probability in the interval $[0, 1]$ and the latter is the generator of the fluctuations $z = +1$ and $z = -1$, with the same statistical weight. The uncertainty associated with each drawing of the numbers $x$ is

$$h_x = \ln W_x, \tag{2}$$

where $W_x = 1/\Delta_x$ and $\Delta_x$ denotes the resolution of the former random generator. The drawing of the numbers $z$ is equivalent to tossing a coin, and consequently is associated with the uncertainty

$$h_z = \ln 2. \tag{3}$$

Let us now imagine that at regular intervals of time, with the time step $\Delta t = 1$, we draw a number $x$ and a number $z$. The uncertainty $H(N)$ grows as a linear function of the number of drawings $N$,

$$H(N) = N(\ln 2 + \ln W_x). \tag{4}$$

We introduce a deterministic rule into this totally stochastic picture. This is done by replacing the variable $x$ with the variable $y$ related to $x$ by



$$y = A\left[\frac{1}{x^{1/(\mu-1)}} - 1\right]. \tag{5}$$

The probability distribution of the variable $y$ is given by

$$p(y) = (\mu - 1)\,\frac{A^{\mu-1}}{(A + y)^{\mu}}. \tag{6}$$

Note that the first moment of the variable $y$ is given by

$$\langle y \rangle = \frac{A}{\mu - 2}. \tag{7}$$

This means that the value $\mu = 2$ is a critical point at which the first moment of the new variable $y$ diverges.
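The change of variable can be checked numerically. The sketch below assumes the inverse-transform form $y = A\,(x^{-1/(\mu-1)} - 1)$ with $x$ uniform on $(0, 1]$, which reproduces the density $(\mu-1)A^{\mu-1}/(A+y)^{\mu}$; replacing $x$ by $1 - x$ would be statistically equivalent. The function name `draw_y`, the seed and the parameter values are illustrative assumptions.

```python
import random

def draw_y(mu, A, rng):
    """Inverse-transform sample of the density (mu - 1) A^(mu-1) / (A + y)^mu."""
    x = 1.0 - rng.random()            # uniform on (0, 1], avoids division by zero
    return A * (x ** (-1.0 / (mu - 1.0)) - 1.0)

rng = random.Random(0)                # fixed seed, for reproducibility only
mu, A = 3.5, 1.0                      # mu > 3 so that the variance is finite
samples = [draw_y(mu, A, rng) for _ in range(200_000)]
mean_y = sum(samples) / len(samples)
print(mean_y, A / (mu - 2.0))         # sample mean versus A / (mu - 2)
```

For $\mu$ approaching 2 from above the sample mean converges more and more slowly, which is the numerical signature of the divergence of the first moment.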



The symbolic sequence is obtained by drawing the number $x$ first. This number determines the number $y$ according to Eq. (5), and fixes the number of sites $N_y = [y] + 1$, with $[y]$ denoting the integer part of $y$, to fill with the same symbol (either $1$ or $-1$). Then we draw the number $z$, and we fill these sites either with $+1$ or with $-1$ according to whether we get $z = 1$ or $z = -1$. Note that, according to the treatment of [24], the length of the strings with the same symbols (either $+1$ or $-1$) is proportional to the length of the laminar regions generated by the nonlinear maps that are currently used to mimic turbulent phenomena [25]. For this reason we shall refer to them as laminar strings. We can thus provide further support to our conviction that critical properties have to be expected at $\mu = 2$. In fact we notice that the divergence of Eq. (7) at $\mu = 2$ implies that the mean length of the laminar strings is infinite, and that, consequently, once one symbol is known, the chances of guessing correctly a large number of symbols coming afterwards are high. Perhaps a more proper way of illustrating the region with $1 < \mu < 2$, where all the moments of the distribution $p(y)$ of Eq. (6) diverge, is that of referring to it as the region where the balance between randomness and determinism is broken and determinism prevails [26]. In conclusion, we think that the deterministic nature of this region can be properly denoted by the entropic index $q = 0$. On the other hand, we expect that the region $\mu > 3$ is characterized by $q = 1$. This is so because the region $\mu > 3$ implies that the second moment, as well as the first moment of $p(y)$, is finite, thereby ensuring the validity of the central limit theorem, and with it, of ordinary statistical mechanics. This means that $\mu > 3$ is expected to yield $q = 1$.
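The generation rule just described can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `laminar_sequence`, the default $A = 1$, the seed handling and the truncation of the last laminar string at the requested length are all assumptions, and the inverse-transform form $y = A\,(x^{-1/(\mu-1)} - 1)$ is used for the drawing of $y$.

```python
import random

def laminar_sequence(n_symbols, mu, A=1.0, seed=0):
    """Concatenate laminar strings of length N_y = [y] + 1, each filled with a random sign."""
    if mu <= 1.0:
        raise ValueError("mu must exceed 1 for the length distribution to be normalizable")
    rng = random.Random(seed)
    seq = []
    while len(seq) < n_symbols:
        remaining = n_symbols - len(seq)
        x = 1.0 - rng.random()                    # uniform on (0, 1]
        try:
            y = A * (x ** (-1.0 / (mu - 1.0)) - 1.0)
        except OverflowError:                     # astronomically long string (mu close to 1)
            y = float("inf")
        # The last string is truncated so that exactly n_symbols are returned.
        n_y = remaining if y >= remaining else int(y) + 1
        z = 1 if rng.random() < 0.5 else -1       # fair coin for the symbol
        seq.extend([z] * n_y)
    return seq

print(laminar_sequence(20, mu=2.5, seed=1))
```

Two consecutive laminar strings receive the same symbol with probability 1/2, so the runs of identical symbols observed in the output can be longer than a single laminar string.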

The original uncertainty of Eq. (4) is deformed by the nonlinear transformation of Eq. (5). However, as we have seen, this can force $q$ to depart from the usual statistical value $q = 1$ only in the region $\mu < 3$. The region $\mu < 2$ is expected to yield $q = 0$. We are thus only left with the problem of establishing the dependence of $q$ on $\mu$ in the region $2 < \mu < 3$. This can be done by evaluating numerically the KST entropy as follows. After defining a given sequence, we fix a window of length $N$. Note that this length $N$ from now on will be referred to as time. Then we move this window along the sequence generated according to the rules illustrated earlier. For any position of this window we find a given configuration $A_1 \cdot A_2 \cdot \ldots \cdot A_N$, where the $A_i$'s have either the value $+1$ or the value $-1$. We count the number of configurations of the same kind obtained moving the window along the chain, then we divide the number of these configurations by the total number of possible configurations $W(N)$, thereby determining the probabilities $p_i$. Finally we use Eq. (1) to evaluate the entropy corresponding to this window of length $N$. It is convenient to study all this in the specific case where the symbolic sequence is generated with no correlation among the distinct sites. In this specific case $W(N) = 2^N$ and, of course, $p_i(N) = 1/2^N$. Thus we obtain

$$H_q(N) = \frac{1 - 2^{N(1-q)}}{q - 1}. \tag{8}$$



Fig. 1 illustrates the behavior of $H_q(N)$ of Eq. (8) for different values of the entropic index $q$. The middle line denotes the behavior corresponding to $q = 1$. Let us identify, therefore, $q = 1$ with $q_{true}$, namely, the entropic index properly reflecting a given statistical condition, the total absence of correlations in this case. Then we see that for $q < q_{true}$ the time derivative of the entropy tends to increase with the time $N$. We see also that if the probing index $q$ is larger than the correct entropic index, namely, $q > q_{true}$, the time evolution of $H_q(N)$ is characterized by a rate of increase smaller than the increase linear in time. It is plausible that the same qualitative behavior is present even if $q_{true} \neq 1$. In fact, if we assume this behavior to be valid in general, namely, even in the case where $q_{true} < 1$, we predict that the entropy growth for $q = 1$ is slower than that of a linear function of time, thus fitting the observation made by several authors (see, for instance, the work of Ref. [?]). This is an important remark since our main purpose here is to apply our statistical analysis to the case where the symbolic sequence is characterized by extended correlations, and consequently the entropic index is expected to depart from the normal value $q_{true} = 1$.
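The window procedure and the closed form of Eq. (8) can be compared on an uncorrelated sequence. In the sketch below the counts are normalized by the number of window positions, so that the probabilities sum to one; this is an assumption about how the normalization is meant. The function names, the seed, the sequence length and the window $N = 6$ are likewise illustrative choices.

```python
import math
import random
from collections import Counter

def tsallis_entropy(probs, q):
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0.0)
    return (1.0 - sum(p ** q for p in probs if p > 0.0)) / (q - 1.0)

def window_probabilities(seq, N):
    """Relative frequency of each configuration seen by a sliding window of length N."""
    counts = Counter(tuple(seq[i:i + N]) for i in range(len(seq) - N + 1))
    n_windows = sum(counts.values())
    return [c / n_windows for c in counts.values()]

def uncorrelated_entropy(N, q):
    """Closed form of Eq. (8) for 2^N equiprobable strings (N ln 2 at q = 1)."""
    if abs(q - 1.0) < 1e-12:
        return N * math.log(2.0)
    return (1.0 - 2.0 ** (N * (1.0 - q))) / (q - 1.0)

rng = random.Random(0)
seq = [rng.choice((-1, 1)) for _ in range(100_000)]   # no correlation among sites
for q in (0.9, 1.0, 1.1):
    numeric = tsallis_entropy(window_probabilities(seq, 6), q)
    print(q, round(numeric, 3), round(uncorrelated_entropy(6, q), 3))
```

The closed form also makes the curvature explicit: for $q < 1$ the second difference of $H_q(N)$ in $N$ is positive, for $q > 1$ it is negative, and only $q = 1$ gives a straight line.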

However, before addressing this challenging problem, it is convenient to recall some important properties. First of all, it is worth noticing that the rules adopted earlier are equivalent to those used in recent papers [27-30] to build up sequences that turn out to be statistically equivalent to real DNA sequences. The distribution of $+1$, corresponding to purines, and of $-1$, corresponding to pyrimidines, was actually established by adopting a nonlinear map [31]. The nonlinear map adopted, in turn, was the same as that widely used in the last few years to generate anomalous diffusion. In a more recent paper [24], it has been shown that these nonlinear maps produce effects statistically equivalent to a stochastic generator which is, in fact, the same as that illustrated earlier as a generator of long-range correlations in the symbolic sequences under study in this paper.

Let us now focus our attention on the fact that any finite string $A_1 \cdot A_2 \cdot \ldots \cdot A_N$ can be associated with an erratic trajectory moving from the "time" $i = 1$ to the "time" $i = N$ on a one-dimensional lattice. The correspondence is established using the following prescription. At the time $i$ the random walker makes a jump of unit length to the right or to the left according to whether $A_i = 1$ or $A_i = -1$. It is shown [24] that in the case of a random walker with correlations infinitely extended in time there is a significant probability that the random walker might make $N$ steps in the same direction. Thus, in a process of diffusion, with all the walkers initially concentrated in the same site, the distribution will split into two ballistic peaks moving in opposite directions. With the increase of $N$ an increasing number of walkers belonging to a peak moving in a given direction will make jumps in the opposite direction. Thus the intensity of the side peaks of the distribution is a decreasing function of time, known [24] to be proportional to the correlation function $\Phi(k) = \langle A_i A_{i+k} \rangle / \langle A_i A_i \rangle$. It is evident that the strings $A_1 \cdot A_2 \cdot \ldots \cdot A_N$ with all the $A_i$'s equal to either $+1$ or $-1$ have the same intensity as these side peaks, to which these strings are equivalent. For this reason we shall refer to these side strings, with the same length as that of the exploring window of size $N$ and with all the symbols $A_i$ corresponding to the same letter, as border strings.

For the sake of some preliminary remarks we make the simplifying assumption that all the strings but the border strings have the same probability $p(N)$. The dependence of $p(N)$ on $N$ is established by setting the normalization condition, which yields

$$p(N) = \frac{1 - 2\Pi(N)}{2^{N} - 2}, \tag{9}$$

where $\Pi(N)$ denotes the probability of each of the two border strings. As remarked earlier, according to Ref. [24], $\Pi(N)$ is proportional to the correlation function $\Phi(N)$ [Eq. (10)]. Thus, under the assumption of equal probability for all the strings but the border strings, we can write

$$H_q(N) = \frac{2\,\Pi(N)^{q} + (2^{N} - 2)^{1-q}\,\bigl(1 - 2\Pi(N)\bigr)^{q} - 1}{1 - q}, \tag{11}$$

with $\Pi(N)$ given by Eq. (10). We note that, in principle, the rules adopted to establish long-range correlations in the sequences under study in this paper make the correlation function $\Phi(N)$ read



$$\Phi(N) = \frac{A^{\beta}}{(A + N)^{\beta}}, \tag{12}$$



where $\beta = \mu - 2$. Consequently, the decay of these border strings is extremely slow and it dominates the time evolution of the entropy for a long time. On the other hand, for times so long as to make the contribution of the central part more important than that resulting from the border strings, the correct entropic index is given by $q = 1$, in accordance with the fact that in such a condition the statistical properties of the sequences become indistinguishable from those of totally uncorrelated sequences [?]. This is in line with the fact that the diffusion process [?] resulting from these rules is characterized by two distinct rescaling properties, the ballistic rescaling of the peaks and the Lévy rescaling of the central part of the diffusion. This means that the interesting statistical properties are blurred by the presence of the border strings. For this reason, we decided to disregard the border strings and to set the normalization condition only on the other strings.
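The equal-probability model with two border strings can be evaluated directly. The sketch below takes the border-string probability $\Pi$ as an input and checks that the choice $\Pi = 2^{-N}$ recovers the uncorrelated result $(1 - 2^{N(1-q)})/(q-1)$. The second evaluation uses $\Pi(N) = \Phi(N)/2$, where the prefactor $1/2$ is a hypothetical choice made only for illustration, as are the function names and the parameter values.

```python
def correlation(N, A, beta):
    """Power-law correlation function A^beta / (A + N)^beta."""
    return (A / (A + N)) ** beta

def entropy_with_border(N, q, Pi):
    """Tsallis entropy when two border strings carry probability Pi each and the
    other 2^N - 2 strings share the remaining weight equally (q != 1, N >= 2)."""
    others = 2.0 ** N - 2.0
    total = 2.0 * Pi ** q + others ** (1.0 - q) * (1.0 - 2.0 * Pi) ** q
    return (total - 1.0) / (1.0 - q)

def uncorrelated_entropy(N, q):
    return (1.0 - 2.0 ** (N * (1.0 - q))) / (q - 1.0)

N, q = 8, 0.9
# With Pi = 2^-N every string is equiprobable and the two expressions coincide.
print(entropy_with_border(N, q, 2.0 ** -N), uncorrelated_entropy(N, q))

Pi = 0.5 * correlation(N, A=1.0, beta=0.5)   # hypothetical prefactor 1/2, beta = mu - 2
print(entropy_with_border(N, q, Pi))         # lower than the uncorrelated value
```

Because a slowly decaying $\Phi(N)$ keeps a large weight on only two strings, the entropy stays below the uncorrelated value for a long time, which is how the border strings mask the behavior of the remaining strings.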

In principle, if no length limitation were set on the analysis of data, it would be possible to derive the correct statistical properties by examining suitably large windows. However, for the sake of computational simplicity we set the maximum length of the window to be $N_{max} = 10$. On the other hand, as we shall see, the approach based on disregarding the border strings makes it possible to reveal the effect of correlations on the entropic index with sequences of relatively small length.
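Disregarding the border strings amounts to dropping the two uniform configurations from the window counts and normalizing over what is left. A minimal sketch, in which the function name `entropy_without_border` and the toy periodic sequence are assumptions chosen to make the result easy to check by hand:

```python
import math
from collections import Counter

def entropy_without_border(seq, N, q):
    """Tsallis entropy of window configurations, with the two border strings removed."""
    counts = Counter(tuple(seq[i:i + N]) for i in range(len(seq) - N + 1))
    counts.pop((1,) * N, None)        # drop the all-(+1) border string
    counts.pop((-1,) * N, None)       # drop the all-(-1) border string
    kept = sum(counts.values())
    if kept == 0:
        raise ValueError("only border strings were observed")
    probs = [c / kept for c in counts.values()]   # normalization on the other strings only
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

seq = [1, 1, -1, -1] * 50
# Only (+1, -1) and (-1, +1) survive, with nearly equal weight: close to ln 2.
print(entropy_without_border(seq, 2, 1.0))
```

With $N_{max} = 10$ there are at most $2^{10} - 2$ surviving configurations, so the counting remains cheap even for long sequences.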

It has to be stressed that the detection of the proper entropic index becomes more and more difficult as the power index $\mu$ comes closer and closer to the critical value $\mu = 2$. In fact, the probabilities of given strings of length $N$ are closely related to the correlation functions. The correlation function $\Phi(N)$, for instance, on the basis of the Shannon-McMillan-Breiman theorem [32-34], is the probability of a string of length equal to 2. This makes it possible to explain why the finite length of the sequence yields an error in the numerical evaluation of the probability of a given sequence and suggests how to correct this error. In fact, the finite length of the sequence causes the truncation of the longer strings and, consequently, the decay of the correlation function $\Phi(N)$ becomes faster than theoretically expected on the basis of the prescription of Eq. (12). Thus, rather than expressing the entropic index $q$ in terms of the parameter $\mu$ corresponding to the prescription of Eq. (12), we relate $q$ to an effective $\bar{\mu}$, obtained from the numerical evaluation of the correlation function $\Phi(N)$. More precisely, we determine numerically the parameter $\beta$ and from it $\bar{\mu} = 2 + \beta$. The numerical results show that for values of $\mu \approx 2.3$ or larger, the effective power index coincides with the value that theoretically should correspond to Eq. (5).
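One possible way to obtain the effective index is sketched below. The text does not specify how $\beta$ is extracted, so the least-squares fit of $\ln \Phi$ against $\ln(A + k)$ with $A$ treated as known is an assumption, as are the function names. The fit is exercised on exact power-law values rather than on a measured correlation function.

```python
import math

def correlation_function(seq, k):
    """Phi(k) = <A_i A_{i+k}> / <A_i A_i> for a sequence of +1/-1 symbols."""
    n = len(seq) - k
    num = sum(seq[i] * seq[i + k] for i in range(n)) / n
    den = sum(s * s for s in seq) / len(seq)
    return num / den

def fit_beta(ks, phis, A):
    """Least-squares slope of ln Phi versus ln(A + k); returns beta = -slope.
    All Phi values must be positive, so noisy tails should be excluded beforehand."""
    xs = [math.log(A + k) for k in ks]
    ys = [math.log(phi) for phi in phis]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx

A, beta = 1.0, 0.5
ks = list(range(1, 11))
phis = [(A / (A + k)) ** beta for k in ks]   # exact power-law values as stand-in data
mu_eff = 2.0 + fit_beta(ks, phis, A)
print(mu_eff)                                # approximately 2.5
```

On a finite sequence one would pass the values returned by `correlation_function` instead of the exact ones; the truncation of long laminar strings then shows up as a fitted $\beta$ larger than $\mu - 2$.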

These numerical expedients make it possible for us to bring the determination of $q$ as a function of $\bar{\mu}$ much closer to the critical region $\mu = 2$. The method adopted is illustrated by Fig. 2. As expected, on the basis of the results illustrated by Fig. 1 and concerning the theoretical prescription of Eq. (8), there exists a crucial value of $q$ which results in a linear dependence of $H_q(N)$ on $N$. In Fig. 2, for instance, we see that at $\mu = \bar{\mu} = 2.5$ the solid line, corresponding to $q \approx 0.89$, is fitted very well by a straight line.
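The search for the value of $q$ that makes $H_q(N)$ linear in $N$ can be sketched as a scan over a grid of probing indices. The linearity criterion used here (sum of squared second differences, scaled by the squared total rise) and the grid are assumptions, since the text does not state how linearity is quantified. The scan is applied to the uncorrelated closed form $(1 - 2^{N(1-q)})/(q-1)$, for which the answer is known to be $q = 1$.

```python
import math

def uncorrelated_entropy(N, q):
    if abs(q - 1.0) < 1e-9:
        return N * math.log(2.0)
    return (1.0 - 2.0 ** (N * (1.0 - q))) / (q - 1.0)

def linearity_defect(values):
    """Sum of squared second differences, scaled by the squared total rise."""
    second = [values[i + 1] - 2.0 * values[i] + values[i - 1]
              for i in range(1, len(values) - 1)]
    rise = values[-1] - values[0]
    return sum(d * d for d in second) / (rise * rise)

def most_linear_q(entropy_fn, Ns, q_grid):
    """Probing index whose entropy curve is closest to a straight line."""
    return min(q_grid, key=lambda q: linearity_defect([entropy_fn(N, q) for N in Ns]))

Ns = range(1, 11)                                          # windows up to N = 10
q_grid = [round(0.5 + 0.01 * k, 2) for k in range(101)]    # 0.50, 0.51, ..., 1.50
print(most_linear_q(uncorrelated_entropy, Ns, q_grid))     # 1.0 for the uncorrelated case
```

For a correlated sequence, `entropy_fn` would be replaced by a function returning the numerically evaluated entropy of the window of length $N$ at probing index $q$.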

Using this numerical method to determine $q$ we find the interesting results illustrated in Fig. 3. On the basis of the earlier remarks making plausible that $q = 0$ at $\mu = 2$, we have been led to fit the numerical data with

$$q = (\bar{\mu} - 2)^{\alpha} \tag{13}$$

for $\bar{\mu} > 2$ and

$$q = 1 \tag{14}$$

for $\bar{\mu} > 3$.

We see from Fig. 3 that the fitting function of Eq. (13) results in a satisfactory agreement with the numerical result if we set $\alpha \approx 0.13$. This means that the critical value $q = 0$ is reached with an infinite derivative, reinforcing our conviction that $\mu = 2$ is a critical point of transition to thermodynamics. The disorder in the region $2 < \mu < 3$ is partial, and localized to the transition from one laminar string to another. However, this is enough to generate a thermodynamic behavior. The traditional wisdom would confine thermodynamics to the region $\mu > 3$, which is where the conventional central limit theorem applies. In a sense this analysis shows that thermodynamics is possible also in the region where the central limit theorem holds in the generalized form established by Lévy [35, 36]. It is interesting to remark that earlier research work [37, 38] has established that the dynamical approach to diffusion, based on the stationary assumption on the fluctuations responsible for diffusion, is incompatible with the condition $\mu < 2$. In this region a diffusion process must rest on a continuous-time random walk method implying the breakdown of the stationary assumption [?]. The region $2 < \mu < 3$ is compatible with stationary diffusion even if the diffusion process departs from ordinary Brownian diffusion and takes the shape of a Lévy process [?]. Therefore we conclude that the stationary diffusion processes have the same regime of validity as the non-extensive thermodynamics of Tsallis. In fact, this paper shows that the new perspective of Tsallis extends the regime of validity of thermodynamics to regions earlier imagined as being non-thermodynamic, in this case, to $\mu < 3$. However, thermodynamics, even within this new perspective, cannot overcome the border $\mu = 2$. In other words, it seems that the Tsallis thermodynamics has the same regime of validity as the dynamic approach to diffusion, which is based on the assumption that fluctuations are characterized by a stationary correlation function [39].
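The statement about the infinite derivative follows from the form of the fit: for $q = (\bar{\mu} - 2)^{\alpha}$ the slope is $\alpha(\bar{\mu} - 2)^{\alpha - 1}$, which diverges as $\bar{\mu} \to 2^{+}$ whenever $\alpha < 1$. The sketch below evaluates the fit and its slope; the function names and the piecewise extension to $q = 0$ below $\bar{\mu} = 2$ (the value the text expects in that region) are assumptions made for the example.

```python
def q_fit(mu_eff, alpha=0.13):
    """Piecewise fit: 0 up to mu = 2, (mu - 2)^alpha for 2 < mu < 3, 1 from mu = 3 on."""
    if mu_eff <= 2.0:
        return 0.0
    if mu_eff >= 3.0:
        return 1.0
    return (mu_eff - 2.0) ** alpha

def q_fit_slope(mu_eff, alpha=0.13):
    """Analytic derivative alpha * (mu - 2)^(alpha - 1), valid for 2 < mu < 3."""
    return alpha * (mu_eff - 2.0) ** (alpha - 1.0)

for mu_eff in (2.001, 2.01, 2.1, 2.5):
    print(mu_eff, round(q_fit(mu_eff), 3), round(q_fit_slope(mu_eff), 2))
```

The slope grows without bound as the effective index approaches 2, while the fitted $q$ itself changes only slowly over most of the interval $2 < \bar{\mu} < 3$.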

We have seen that the numerical calculations rest on both the expedient of adopting $\bar{\mu}$ rather than $\mu$ and that of neglecting the border strings. The latter method is not only an expedient to extend the regime of validity of our numerical calculations. It reflects a property that probably deserves further study. In fact the border strings correspond to the peaks that appear in the dynamical approach to Lévy diffusion. As discussed in Ref. [24], these peaks are a consequence of the dynamic approach and the Lévy statistics are recovered only in the time asymptotic limit. On the other hand, as pointed out by the authors of [?], a satisfactory agreement between the entropic properties of trajectories and the general probabilistic arguments of Refs. [?]-[23] is obtained in the long-time regime, where the peak intensity tends to vanish. This means, in other words, that the peaks seem to be dynamic properties incompatible with the thermodynamic treatment, in accordance with the observation made in this paper that the emergence of an $H_q(N)$ linearly dependent on $N$ would be blurred by the presence of the two border strings.

We would be tempted to stress that the detection of $q < 1$ is expected on the basis of the theoretical analysis made by Lyra and Tsallis [21] on the dynamics of the logistic map. These authors show indeed that the generalization of the Pesin theorem to the case of the logistic map implies $q < 1$ at the chaos threshold. However, by the same token we should conclude that these results disagree with those of Ref. [23], which rests, on the contrary, on a condition physically much closer to that discussed in the present paper. Actually, some caution must be exercised in establishing a direct connection between the results of this paper and the research work of [?], for the reasons pointed out earlier in this paper. Here we are dealing with the ME, which might imply a coarse graining so strong as to lose a close relation with the KST [21] and consequently with the generalization of the important theorem of Pesin [22]. This is made evident also by the fact that the correlated sequences are here generated by a stochastic approach, even if this turns out to be equivalent to the adoption of a nonlinear map [?]. The statistical analysis in terms of the ME is insensitive to whether a nonlinear map or a stochastic approach is adopted. However, the peaks are dynamic properties that in all cases seem to be incompatible with the adoption of a merely entropic approach. This reinforces the need for carrying out the ME analysis by disregarding the border strings, as we propose in this paper.

In summary, this paper sheds light on the breakdown of extensivity caused by time correlations. There are at least two important sources of non-extensivity: nonlocality in space [?] and nonlocality in time. For the spatial case, up to now, there is no clear-cut connection in the literature between $q$ and the critical index characterizing spatial correlations (although there is a variety of strong indications). For the temporal case this manuscript establishes, for the first time, the analogous connection between $q$ and $\mu$ (see Eq. (13) and Eq. (14)). This is an interesting result and some efforts should be made to establish theoretically the critical exponent $\alpha$.
FIG. 1. The KST entropy as a function of $N$ in the completely uncorrelated case (corresponding to $\mu = \infty$). In this case the KST entropy is expressed by Eq. (8). For this reason the three curves have been derived from Eq. (8). The upper, middle and bottom lines refer to $q = 0.9$, $q = 1$ and $q = 1.1$, respectively.

FIG. 2. The KST entropy as a function of $N$ with $\bar{\mu} = \mu = 2.5$. The three curves have been obtained using the numerical treatment described in the text. The upper (squares), middle (circles) and bottom (triangles) plots refer to $q = 0.82$, $q = 0.89$ and $q = 0.98$, respectively.

FIG. 3. The entropic index $q$ versus $\bar{\mu}$. See the text for the definition of $\bar{\mu}$. The points with error bars are the result of the numerical treatment described in the text, and the line denotes the function $q = (\bar{\mu} - 2)^{\alpha}$ with $\alpha \approx 0.13$.